Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

Neural Information Processing Systems

By replacing the linear operation in transformers with our approximated one, we can achieve up to 2.7× peak memory reduction with almost no accuracy drop and enable up to 6.4× larger batch size.
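To make the idea of an approximated linear operation concrete, here is a minimal sketch of plain column-row sampling for a matrix product, the generic estimator the title refers to. It is not the paper's winner-take-all variant: the function names `matmul` and `crs_matmul`, the norm-proportional sampling distribution, and the pure-Python list-of-lists representation are all assumptions made for illustration.

```python
import math
import random


def matmul(A, B):
    """Exact product of an n x m matrix A and an m x p matrix B (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def crs_matmul(A, B, k, rng):
    """Estimate A @ B from k sampled column-row pairs (illustrative sketch).

    Pair i (column i of A, row i of B) is drawn with probability proportional
    to the product of their Euclidean norms, and each drawn outer product is
    rescaled by 1 / (k * p_i) so the estimate is unbiased.
    Assumes at least one pair has a non-zero norm product.
    """
    m = len(B)
    weights = [
        math.sqrt(sum(row[i] ** 2 for row in A)) * math.sqrt(sum(x * x for x in B[i]))
        for i in range(m)
    ]
    total = sum(weights)
    probs = [w / total for w in weights]

    n, p = len(A), len(B[0])
    est = [[0.0] * p for _ in range(n)]
    # Sample k indices with replacement and accumulate the rescaled outer products.
    for i in rng.choices(range(m), weights=probs, k=k):
        scale = 1.0 / (k * probs[i])
        for r in range(n):
            a = A[r][i] * scale
            for c in range(p):
                est[r][c] += a * B[i][c]
    return est


# Usage: approximate a small product using 2 of the 3 column-row pairs.
A = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]
B = [[2.0, 1.0], [0.5, 0.0], [1.0, 4.0]]
approx = crs_matmul(A, B, k=2, rng=random.Random(0))
exact = matmul(A, B)
```

The memory saving in this kind of scheme would come from keeping only the sampled columns and rows rather than the full operands; how the paper selects and stores them is not shown here.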